Method for haplotyping by mass spectrometry

ABSTRACT

The invention relates to a method for performing haplotyping of multiple single nucleotide polymorphisms (SNPs) that uses allele specific PCR and mass spectrometry analysis.

[0001] The invention relates to a method for performing haplotyping of multiple single nucleotide polymorphisms (SNPs) that uses allele specific PCR and mass spectrometry analysis.

[0002] The complete sequence of the human genome will be obtained and published in full in the next few months. This project will reveal the sequence of all 3 billion bases and the relative positions of all genes in this genome, whose number is estimated at between 30,000 and over 100,000. Having this sequence opens numerous possibilities for elucidating gene function and the interaction between different genes.

[0003] It also allows the implementation of pharmacogenetics and pharmacogenomics. Pharmacogenetics and pharmacogenomics aim at the targeted use of medication depending on the genotype of an individual, and thus at a dramatic improvement in the efficiency of drugs. A necessary intermediate step towards this is the determination of genome-wide variability between individuals. This is accomplished by identifying different markers and then using them for genotyping (characterization of the presence of a marker in an individual) and haplotyping (determination of the linkage between different markers in close proximity).

[0004] Currently two kinds of markers are used for genotyping: microsatellites and single nucleotide polymorphisms (SNPs).

[0005] Microsatellites are highly polymorphic markers in which different alleles consist of different numbers of repetitive sequence elements between conserved flanking regions. On average, a microsatellite is found every 100,000 bases. A complete map of microsatellite markers covering the human genome was presented by the CEPH (Dib et al., Nature Mar. 14, 1996;380(6570):152-4). Microsatellites are commonly genotyped by sizing, on gels, PCR products generated across the repeat region. The most widely used systems are based on fluorescently labeled DNA detected in fluorescence sequencers.

[0006] Fewer SNPs are currently in the public domain. A SNP map with 300,000 SNPs is being established by the SNP consortium (Science, 1999, 284, 406-407).

[0007] Several methods for genotyping SNPs are available to the person skilled in the art, each with its own advantages and disadvantages.

[0008] Some of these methods, like the oligonucleotide ligation assay (OLA), rely on gel-based detection and for this reason allow only medium-throughput applications.

[0009] Others rely on pure hybridization (oligonucleotide arrays, DNA chips), which is less discriminating and difficult to tune to the high stringency required. Although DNA chips are well suited for the simultaneous genotyping of a large number of markers in a very limited region of the genome and in a manageable number of individuals, the main problem with these devices is the difficulty of optimizing the hybridization conditions (in particular the stringency).

[0010] Approaches using primer extension and detection by fluorescence have been described. Their advantage is the facile detection of emission in an ELISA-type reader. Their limitation is the small number of fluorescent dyes available, which in turn limits the number of samples that can be analyzed simultaneously.

[0011] Several methods of SNP genotyping use mass spectrometric detection, as mass spectrometry allows very high throughput and at the same time gives added information on the base that is present through the mass of the product obtained. In applications where an allele specific product is measured, this information is direct and therefore very robust.

[0012] Several methods using mass spectrometry have been proposed for SNP genotyping (WO98/23774, U.S. Pat. No. 5,843,669, these documents being incorporated herein by reference).

[0013] Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI) allows the mass spectrometric analysis of biomolecules (Karas and Hillenkamp Anal. Chem. 60, 2299-2301 (1988)). Indeed, MALDI has been applied to the analysis of DNA in variations that range from the analysis of PCR products to approaches using allele specific termination to single nucleotide primer extension reactions, sequencing and hybridization (U.S. Pat. No. 5,885,775, WO96/29431, U.S. Pat. No. 5,691,141, WO97/37041, WO94/16101, WO96/27681, GB2339279, all incorporated herein by reference).

[0014] A major drawback of these approaches is that they rely heavily on stringent purification procedures prior to MALDI analysis, which do not lend themselves to easy automation and account for a major part of the cost. Spin-column purification and/or magnetic bead technology, as well as reversed-phase purification, are frequently applied.

[0015] Indeed, the analysis of nucleic acids by MALDI is strongly dependent on the charge state, and a 100-fold increase in analysis sensitivity can be achieved when the DNA is conditioned to carry a single positive charge. Such modified DNA products are also significantly less susceptible to adduct formation and therefore do not require purification procedures (WO 96/27681, GB 2339279, Gut and Beck (1995) Nucleic Acids Research, 23, 1367-1373, Gut et al. (1997) Rapid Commun. Mass Spectrom., 11, 43-50, all these documents being incorporated herein by reference).

[0016] An assay developed from this for the generation of allele specific products for SNPs has been termed the “GOOD Assay” for SNP analysis (Sauer et al., Nucleic Acids Research, 2000, 28, E 13, which is incorporated herein by reference).

[0017] Nevertheless, genotyping information on its own does not allow a full assessment of the translation of a DNA sequence into a protein or of the regulation of transcription. In particular, when the two alleles of a given gene carry different SNPs, it is very important to have information about the combination of the different SNPs in relation to each other (haplotyping), and about which of the alleles are on the same DNA strand.

[0018] A few methods for haplotyping exist. They rely on the generation of allele specific products by allele specific PCR, using a primer whose 3′ end base specifically matches the allele to be amplified. The presence or absence of a PCR product is used to identify a haplotype. However, these methods are limited in their capacity to query multiple positions simultaneously.

[0019] Two major approaches are taken to increase the specificity of allele specific PCR. One is the addition of GC-rich tails to the 5′ end of the PCR primers, combined with a high annealing temperature (Liu et al. (1997), Genome Res 7(4): 389-98, which is incorporated herein by reference). High stringency is thus obtained in the initial cycles of the PCR. In later cycles, the GC tail, which has by then been incorporated into the amplification products, favors the amplification of these products as templates. However, this does not give sufficient stringency in all cases.

[0020] Another way to increase the stringency of allele specific PCR is the introduction of further mismatches into the primer (Newton et al. (1989), Nucleic Acids Res 17(7): 2503-16, which is incorporated herein by reference). On its own, this method may also give limited stringency.

[0021] It may therefore prove interesting to combine these two methods for increasing the specificity and stringency in the allele specific PCR reaction.

[0022] Nevertheless, allele specific PCR (with or without these improvements) has the disadvantage that it can only query two polymorphisms in relation to each other.

[0023] Clark et al. (1998, Am. J. Hum. Genet., 63, 595-612) describe a method for the analysis of nucleotide-sequence variation in the human lipoprotein lipase gene that uses sequencing of the gene and of the allele specific PCR products for genotyping. The authors discuss the weaknesses of this method of haplotyping, in particular the large amount of effort it requires.

[0024] It is the aim of the invention to provide a simple, high-throughput method for haplotyping that allows the linkage of multiple SNPs to be determined in a fast, cost-efficient and reliable way. The method of the invention allows the simultaneous analysis of multiple polymorphic sites after only one allele specific PCR reaction.

[0025] Indeed, it is often necessary to determine the alleles of more than two single nucleotide polymorphisms by genotyping. If the individual turns out to be heterozygous for more than one of the SNPs, it is of interest to determine which of the alleles are on the same DNA strand.

[0026] The invention uses allele specific PCR to amplify only one allele from the genomic DNA. The allele specific primer is designed to match one allele of a heterozygous SNP. The amplification product is then genotyped, which reveals the alleles of the other SNPs present on this product and allows the determination of the haplotype, as the previously heterozygous SNPs now appear homozygous.

[0027] Associating the allele of the polymorphism targeted by the allele specific PCR with the alleles determined for the other polymorphisms gives the haplotype.
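The deduction in [0026] and [0027] amounts to a subtraction: the allele specific product carries a single allele at every SNP, and the other haplotype is whatever remains of the individual's genotype. The following is a minimal Python sketch, assuming ideal allele specificity (exactly one allele called per SNP on the product); the function name and SNP labels are hypothetical and serve only as illustration.

```python
from typing import Dict, Tuple


def phase_by_subtraction(
    genotype: Dict[str, Tuple[str, str]],
    product_alleles: Dict[str, str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (haplotype_1, haplotype_2) for one individual.

    genotype: SNP label -> the two alleles called in step a).
    product_alleles: SNP label -> the single allele called on the allele
        specific PCR product in step c) (assumes perfect allele specificity).
    """
    hap1: Dict[str, str] = {}
    hap2: Dict[str, str] = {}
    for snp, (a, b) in genotype.items():
        seen = product_alleles[snp]
        if seen == a:
            other = b
        elif seen == b:
            other = a
        else:
            raise ValueError(f"{snp}: allele {seen} is not in genotype {a}/{b}")
        hap1[snp] = seen   # allele carried by the amplified strand
        hap2[snp] = other  # the remaining allele belongs to the other strand
    return hap1, hap2


# Hypothetical three-SNP example; snp2 is homozygous and identical on both strands.
h1, h2 = phase_by_subtraction(
    {"snp1": ("A", "G"), "snp2": ("C", "C"), "snp3": ("T", "C")},
    {"snp1": "G", "snp2": "C", "snp3": "T"},
)
print(h1)  # {'snp1': 'G', 'snp2': 'C', 'snp3': 'T'}
print(h2)  # {'snp1': 'A', 'snp2': 'C', 'snp3': 'C'}
```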

[0028] The invention is therefore drawn to a method for the determination of the haplotype of an individual, comprising the steps of:

[0029] a) genotyping of more than two single nucleotide polymorphisms (SNPs) by mass spectrometry;

[0030] b) allele specific PCR with one primer specific for one allele of a heterozygous polymorphism, if more than one polymorphism is heterozygous;

[0031] c) genotyping on the allele specific PCR product by mass spectrometry.

[0032] The method of the invention could also be used to identify nearly identical sequences, in order to find out whether a sequence is duplicated or heterozygous. The variations can be used to generate “allele specific” products. If other polymorphisms that were heterozygous in the initial genotyping remain heterozygous, it is clear that the sequence is duplicated. If the second round of genotyping of this system shows all SNPs as homozygous, it is probable that the sequence being studied is not duplicated.
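The decision rule of [0032] can be written as a short comparison of the two genotyping rounds. This is a minimal sketch under the assumption that each round is summarized as the set of alleles called per SNP; the function name and labels are hypothetical.

```python
from typing import Dict, Set


def interpret_second_round(
    first_round: Dict[str, Set[str]],
    second_round: Dict[str, Set[str]],
) -> str:
    """Compare heterozygous SNPs before and after allele specific PCR.

    first_round: SNP label -> alleles called on the initial PCR product.
    second_round: SNP label -> alleles called on the "allele specific" product.
    """
    het_before = [snp for snp, alleles in first_round.items() if len(alleles) > 1]
    still_het = [snp for snp in het_before if len(second_round[snp]) > 1]
    if still_het:
        # A second copy of the sequence was co-amplified: duplication.
        return "duplicated"
    # Every previously heterozygous SNP collapsed to one allele.
    return "probably not duplicated"


print(interpret_second_round(
    {"v1": {"A", "G"}, "v2": {"C", "T"}},
    {"v1": {"A"}, "v2": {"C", "T"}},
))  # duplicated
```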

[0033] The use of mass spectrometry allows a large number of samples to be analyzed and the corresponding data to be obtained in a multiplex reaction. Therefore, the method of the invention, which is characterized by the combination of allele specific PCR with mass spectrometry for genotyping and data analysis, can be used at high throughput. It can also be automated and will allow an easy and quick determination of the SNP profile of patients. It will therefore allow the full implementation of pharmacogenetics and pharmacogenomics and an improved use of the data obtained from the genome sequencing project.

[0034] The genotyping of the SNPs in steps a) and c) is performed by mass spectrometry after the generation of allele specific products, which can conventionally be obtained by primer extension, oligonucleotide ligation or a cleavase reaction.

[0035] One of the advantages of the method according to the invention is that multiple SNPs in a DNA sample can be analyzed at the same time in a multiplexed reaction, by choosing the appropriate conditions as known to the person skilled in the art.

[0036] To perform the allele specific PCR reaction of step b), one would use a primer that matches one allele of a heterozygous SNP, preferably a primer that specifically hybridizes with the heterozygous SNP located at the most 5′ or the most 3′ position of all tested SNPs. The other primer would hybridize to both alleles and be located so as to amplify the region containing all heterozygous SNPs.
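Choosing where to place the allele specific primer reduces to finding the outermost heterozygous SNP. The sketch below assumes that SNP positions increase in the 5′ → 3′ direction of the strand being read, which agrees with the example, where SNP 298 is the most 5′ and SNP 423 the most 3′ heterozygous SNP. The function name is hypothetical.

```python
from typing import Dict, Optional, Set


def choose_anchor_snp(genotypes: Dict[int, Set[str]], end: str = "5prime") -> Optional[int]:
    """Return the position of the heterozygous SNP to target with the allele specific primer.

    genotypes: SNP position -> alleles called in step a); positions are assumed
        to increase in the 5' -> 3' direction.
    end: "5prime" or "3prime", selecting the most 5' or most 3' heterozygous SNP.
    Returns None when at most one SNP is heterozygous, since step b) is then not needed.
    """
    if end not in ("5prime", "3prime"):
        raise ValueError("end must be '5prime' or '3prime'")
    het = sorted(pos for pos, alleles in genotypes.items() if len(alleles) > 1)
    if len(het) <= 1:
        return None
    return het[0] if end == "5prime" else het[-1]


# Genotypes of the beta-2 adrenergic receptor example: all four SNPs heterozygous.
example = {298: {"T", "C"}, 325: {"C", "T"}, 390: {"G", "A"}, 423: {"G", "C"}}
print(choose_anchor_snp(example))            # 298
print(choose_anchor_snp(example, "3prime"))  # 423
```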

[0037] In a preferred implementation of the invention, allele specificity for the PCR amplification is achieved by the 3′ end base of the primer. This base is chosen to match one allele and not the other. Further specificity can be achieved by using a primer that has between 10 and 25 bases complementary to the sequence of the genomic DNA, most preferably between 15 and 20 or 22, the specificity being obtained by the 3′ end base, as described.

[0038] One could also include an unspecific GC-rich tail at the 5′ end of the primer and/or further mismatches upstream of the 3′ end allele specific base, as described above. More preferably, the primer has one mismatch more than 3 bases away from the 3′ end.
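The primer layout of [0037] and [0038] (GC-rich 5′ tail, a genome-complementary part, an optional extra mismatch, and an allele-matching 3′ end base) can be sketched as simple string assembly. This is an illustration only, built on two assumptions: mismatch positions are counted from the 3′ end with the 3′ end base as position 1, and the mismatch is created by swapping the base for its complement, one arbitrary choice among several. The sequences and function name are hypothetical.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def allele_specific_primer(
    specific_part: str,
    allele_base: str,
    tail: str = "",
    mismatch_position: Optional[int] = None,
) -> str:
    """Assemble an allele specific primer, written 5' -> 3'.

    specific_part: bases complementary to the genomic DNA, ending just before the SNP.
    allele_base: 3' end base, chosen to match one allele of the SNP.
    tail: optional unspecific GC-rich 5' tail.
    mismatch_position: optional deliberate mismatch, counted from the 3' end
        with the 3' end base as position 1 (assumed convention).
    """
    core = list((specific_part + allele_base).upper())
    if mismatch_position is not None:
        if not 2 <= mismatch_position <= len(core):
            raise ValueError("mismatch must lie upstream of the 3' end base")
        i = len(core) - mismatch_position  # convert to an index from the 5' end
        core[i] = COMPLEMENT[core[i]]      # complement swap creates the mismatch
    return tail.lower() + "".join(core).lower()


# Hypothetical 10-base specific part; the two primers differ only at the 3' end base.
p_c = allele_specific_primer("ACGTACGTAC", "C", tail="gcg", mismatch_position=5)
p_t = allele_specific_primer("ACGTACGTAC", "T", tail="gcg", mismatch_position=5)
print(p_c)  # gcgacgtacctacc
print(p_t)  # gcgacgtacctact
```

The pair mirrors the relationship between SEQ ID No 3 and SEQ ID No 4, which share everything except their last base.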

[0039] The annealing temperature is chosen to be critical, that is, higher than the calculated melting temperature. In the first rounds of the PCR, only the fully complementary sequence can anneal. Once some rounds of PCR have been completed, the higher annealing temperature made possible by the GC-rich tail ensures that predominantly a single allele is amplified.
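A crude melting-temperature estimate is enough to show why the tail only helps in later cycles. The sketch below uses the Wallace rule (2° C. per A or T, 4° C. per G or C), a rough rule of thumb for short oligonucleotides that is swapped in here for illustration; it is not the calculation referred to in the text. Both sequences are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C: 2 * (A + T) + 4 * (G + C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


specific_part = "ACGTACGTACGTACGT"  # hypothetical 16-base genome-complementary part
tail = "GCGGGC"                     # hypothetical GC-rich 5' tail

# Early cycles: the genomic template has no partner for the tail,
# so only the specific part can pair.
print(wallace_tm(specific_part))         # 48
# Later cycles: products carry the tail's complement, so the whole primer pairs.
print(wallace_tm(tail + specific_part))  # 72
```

An annealing temperature set between these two estimates would be stringent on genomic DNA yet permissive on tailed products.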

[0040] Mass spectrometry is used for this procedure as it is well suited to the analysis of up to several tens of polymorphisms and is very easy to operate. Full automation of the sample preparation is therefore possible with this method. Depending on the sample preparation procedure used for mass spectrometric genotyping, this technology can be very effective.

[0041] In a preferred implementation of this invention, the method performed for one or both of the genotyping steps a) and c) uses primers that are chimeric in nature, and the procedure followed is the GOOD assay described by Sauer et al. (op. cit., which is incorporated herein by reference).

[0042] In a preferred implementation of the invention, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI) is used for the analysis of the genotypes. In another embodiment, electrospray ionization mass spectrometry is used for the detection.

[0043] In a preferred implementation of the invention the reagents for the initial genotyping are the same as the ones used for the genotyping after allele specific PCR.

[0044] This invention provides a facile procedure for determining haplotypes that is cost efficient, highly reliable and that can easily be automated, and so lends itself to high-throughput.

[0045] This streamlined procedure makes use of the potential for highly parallel preparation of products for genotyping, of their conditioning so that they require no purification, and of the ability of mass spectrometers to distinguish large numbers of products simultaneously in one spectrum and to record a single spectrum in a few seconds. This invention offers a way to substantially solve the problems currently encountered in the art when haplotyping a large number of SNPs, and makes streamlined and efficient SNP genotyping possible.

[0046] The invention further relates to a kit for the implementation of haplotyping by the method of the invention, which comprises primers for PCR that generate allele specific products. The kit of the invention may also include the reagents to perform steps a) and c) of the procedure (generation of samples to be analyzed by mass spectrometry for genotyping), and instructions on how to perform the method of the invention.

DESCRIPTION OF THE FIGURES

[0047] FIG. 1 describes the principle of the method of the invention, applied to 4 SNPs, two of which are heterozygous. The first genotyping step (1.A) generates 6 products, as determined by mass spectrometry. Allele specific PCR (1.B) leads to the amplification of the paternal strand. Genotyping of this product (1.C) identifies the SNP alleles present on this strand.
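The product count in FIG. 1 follows from one allele specific product per distinct allele at each SNP: two for a heterozygous SNP, one for a homozygous SNP. A small sketch with hypothetical genotypes, assuming that this one-product-per-allele rule holds in the multiplexed reaction:

```python
from typing import Iterable, Tuple


def expected_products(genotypes: Iterable[Tuple[str, str]]) -> int:
    """One allele specific product per distinct allele at each SNP."""
    return sum(len(set(alleles)) for alleles in genotypes)


# Hypothetical genotypes for four SNPs, two of them heterozygous (as in FIG. 1).
before = [("A", "G"), ("C", "C"), ("T", "C"), ("G", "G")]
# After allele specific PCR only one strand is present, so every SNP looks homozygous.
after = [("G", "G"), ("C", "C"), ("T", "T"), ("G", "G")]
print(expected_products(before))  # 6
print(expected_products(after))   # 4
```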

[0048] FIG. 2 shows typical mass spectra obtained after genotyping and haplotyping for SNPs 298 and 390 (FIG. 2.A) and 325 and 423 (FIG. 2.B) of the beta-2 adrenergic receptor gene. The top spectra show the genotyping results, while the middle and bottom spectra show the results obtained after allele specific PCR for haplotypes 1 and 2 respectively.

EXAMPLE

[0049] The example illustrates the method of the invention, which can easily be generalized to other genes by the person skilled in the art.

[0050] The example shown here is the haplotyping of 4 published SNPs in the β2-adrenergic receptor gene. The SNPs are T/C at 298, C/T at 325, G/A at 390 and G/C at 423. For each SNP, genotyping by the GOOD assay was established on a fragment of the genomic DNA amplified with primers SEQ ID No 1 and SEQ ID No 2. The genotyping was performed following the method of Sauer et al. (Nucleic Acids Research, 2000, 28, E 13, which is incorporated herein by reference), with primers SEQ ID No 5 to SEQ ID No 8 for the primer extension. The analysis is done in positive ion mode on a MALDI mass spectrometer.

[0051] In a first experiment, the genotype for all four SNPs is determined. The genotype for SNP 298 and SNP 390 is shown in the panel of FIG. 2.A, while the genotype for SNP 325 and SNP 423 is shown in the panel of FIG. 2.B. The m/z values observed for the products are in the range of 1400 to 1500 Da (FIG. 2, top spectra).

[0052] The data show that the individual whose DNA is tested is heterozygous for all four SNPs.

[0053] In order to determine the haplotype, allele specific PCR reactions are carried out using the allele specific primer SEQ ID No 3 or SEQ ID No 4, in combination with the primer SEQ ID No 2.

[0054] Primers SEQ ID No 3 and SEQ ID No 4 are specific for one allele of SNP 298 (the most 5′ of the heterozygous SNPs), and additionally carry a GC-rich tail and a mismatch located 5 bases from the 3′ end of the primer.

[0055] The allele specific PCR could also have been carried out with primer SEQ ID No 1 and primers that are allele specific for SNP 423 (the most 3′ of the heterozygous SNPs).

[0056] Genotyping is performed on the allele specific products (FIG. 2, middle spectra for haplotype 1, bottom spectra for haplotype 2) using the same method as before.

[0057] It is clear that the two haplotypes obtained add up to the genotype of the individual, and the data allow the determination of the complete haplotype of the tested individual.

DNA sample     SNP 298, −47   SNP 325, −20   SNP 390, 46    SNP 423, 79
Alleles        (C/T)          (C/T)          (A/G)          (C/G)
Method         MALDI          MALDI          MALDI          MALDI
Genotype       TC             TC             GA             GC
Haplotype 1    C              C              G              G
Haplotype 2    T              T              A              C
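The statement that the two haplotypes add up to the genotype can be checked mechanically on the values reported for the example. This sketch simply re-enters those values; it assumes that SNP positions increase 5′ → 3′ when reading a haplotype as a string, and the helper names are hypothetical.

```python
from typing import Dict

# Genotype and haplotypes reported for the example, keyed by SNP position.
genotype: Dict[int, str] = {298: "TC", 325: "TC", 390: "GA", 423: "GC"}
haplotype_1: Dict[int, str] = {298: "C", 325: "C", 390: "G", 423: "G"}
haplotype_2: Dict[int, str] = {298: "T", 325: "T", 390: "A", 423: "C"}


def haplotypes_add_up(geno: Dict[int, str], hap_a: Dict[int, str], hap_b: Dict[int, str]) -> bool:
    """True if, at every SNP, the two haplotype alleles are exactly the genotype's two alleles."""
    return all(sorted(geno[snp]) == sorted(hap_a[snp] + hap_b[snp]) for snp in geno)


def as_string(hap: Dict[int, str]) -> str:
    """Read a haplotype in order of increasing position."""
    return "".join(hap[pos] for pos in sorted(hap))


print(haplotypes_add_up(genotype, haplotype_1, haplotype_2))  # True
print(as_string(haplotype_1), as_string(haplotype_2))         # CCGG TTAC
```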

[0058] The PCR reaction is performed under classical conditions, with 0.5 μl genomic DNA (50 ng/μl), 0.5 μl of each of the primers SEQ ID No 1 and SEQ ID No 2 (7.5 pmol/μl), and the following cycling conditions:

[0059] 1. 95° C. 2 min

[0060] 2. 95° C. 20 sec

[0061] 3. 68° C. 30 sec

[0062] 4. 72° C. 30 sec

[0063] repeat steps 2 to 4 for a total of 35 cycles.

[0064] Primer extension reactions are classically performed using 1 μl of the copy primer (SEQ ID No 5 to SEQ ID No 8) (25 pmol/μl), with the cycling conditions:

[0065] 1. 95° C. 3 min

[0066] 2. 95° C. 10 sec

[0067] 3. 58° C. 30 sec

[0068] 4. 72° C. 15 sec

[0069] repeat steps 2 to 4 for a total of 35 cycles.
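The two cycling programs above and the reagent amounts in [0058] and [0064] reduce to simple arithmetic: the amount delivered is volume times concentration, and the programmed hold time is the initial step plus the number of cycles times the sum of steps 2 to 4. The sketch assumes 35 cycles in total and ignores ramping between temperatures, so it gives a lower bound on run time. The class name is hypothetical, and reaction volumes are not given in the text, so final concentrations are not computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CyclingProgram:
    initial: Tuple[float, int]            # (temperature in deg C, hold in s) for step 1
    cycle_steps: List[Tuple[float, int]]  # steps 2 to 4
    cycles: int

    def hold_time_s(self) -> int:
        """Total programmed hold time in seconds; ramping between steps is ignored."""
        per_cycle = sum(hold for _, hold in self.cycle_steps)
        return self.initial[1] + self.cycles * per_cycle


pcr = CyclingProgram((95, 120), [(95, 20), (68, 30), (72, 30)], cycles=35)
extension = CyclingProgram((95, 180), [(95, 10), (58, 30), (72, 15)], cycles=35)

# Amount delivered = volume (ul) x concentration (per ul).
dna_ng = 0.5 * 50               # genomic DNA per PCR
pcr_primer_pmol = 0.5 * 7.5     # each of SEQ ID No 1 and SEQ ID No 2
extension_primer_pmol = 1 * 25  # extension primer (SEQ ID No 5 to SEQ ID No 8)

print(pcr.hold_time_s(), extension.hold_time_s())     # 2920 2105
print(dna_ng, pcr_primer_pmol, extension_primer_pmol)  # 25.0 3.75 25
```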

[0070] Phosphodiesterase digestion is performed by adding 1 μl acetic acid (0.5 M) and 3 μl PDE, followed by incubation at 37° C. for 80 min.

[0071] Alkylation is performed by adding 45 μl acetonitrile, 15 μl triethylamine/CO2 buffer (2 M, pH 8.0) and 14 μl MeI, followed by incubation at 40° C. for 25 min. A 20 μl sample is then taken and mixed with 45 μl of 40% acetonitrile.

[0072] MALDI analysis is performed with α-cyano cinnamic acid methyl ester in acetone as the matrix, spotted onto the target; 0.5 μl of the sample is then spotted onto the matrix.

[0073] Analysis is done in positive ion mode on a MALDI mass spectrometer.

[0074] Allele specific PCR is performed using either primer SEQ ID No 3 or SEQ ID No 4 and SEQ ID No 2, following the same classical conditions as previously described.

SEQUENCE LISTING

Number of sequences: 8

SEQ ID No 1 (19 bases, DNA, artificial sequence, primer 1): ctcgcgggcc cgcagagcc
SEQ ID No 2 (24 bases, DNA, artificial sequence, primer 2): gttggtgacc gtctgcagac gctc
SEQ ID No 3 (26 bases, DNA, artificial sequence, primer 3): gcgggcgggg cgccgtgggt cagccc
SEQ ID No 4 (26 bases, DNA, artificial sequence, primer 4): gcgggcgggg cgccgtgggt cagcct
SEQ ID No 5 (16 bases, DNA, artificial sequence, primer 5): ccgccgtggg tccgcc
SEQ ID No 6 (17 bases, DNA, artificial sequence, primer 6): tcttgctggc acccaat
SEQ ID No 7 (17 bases, DNA, artificial sequence, primer 7): cgcgcagtct ggcaggt
SEQ ID No 8 (18 bases, DNA, artificial sequence, primer 8): gaccacgacg tcacgcag
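A quick consistency check on the sequence listing confirms that each primer has its declared length and that SEQ ID No 3 and SEQ ID No 4 differ only at their 3′ end base (C versus T), which matches the T/C alleles stated for SNP 298. The sequences are copied from the listing; the helper names are hypothetical.

```python
from typing import List

SEQUENCES = {
    1: "ctcgcgggcc cgcagagcc",
    2: "gttggtgacc gtctgcagac gctc",
    3: "gcgggcgggg cgccgtgggt cagccc",
    4: "gcgggcgggg cgccgtgggt cagcct",
    5: "ccgccgtggg tccgcc",
    6: "tcttgctggc acccaat",
    7: "cgcgcagtct ggcaggt",
    8: "gaccacgacg tcacgcag",
}
DECLARED_LENGTHS = {1: 19, 2: 24, 3: 26, 4: 26, 5: 16, 6: 17, 7: 17, 8: 18}


def clean(seq: str) -> str:
    """Drop the spaces used to group bases in tens."""
    return seq.replace(" ", "").upper()


def differing_positions(a: str, b: str) -> List[int]:
    """0-based indices (from the 5' end) at which two equal-length sequences differ."""
    a, b = clean(a), clean(b)
    if len(a) != len(b):
        raise ValueError("sequences differ in length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


lengths_ok = all(len(clean(s)) == DECLARED_LENGTHS[k] for k, s in SEQUENCES.items())
print(lengths_ok)                                        # True
print(differing_positions(SEQUENCES[3], SEQUENCES[4]))   # [25]
print(clean(SEQUENCES[3])[-1], clean(SEQUENCES[4])[-1])  # C T
```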

1. A method for the determination of the haplotype of an individual, comprising the steps of: a) genotyping of more than two single nucleotide polymorphisms (SNPs) by mass spectrometry; b) allele specific PCR with one primer specific for one allele of a heterozygous polymorphism, if more than one polymorphism is heterozygous; c) genotyping on the allele specific PCR product by mass spectrometry.
 2. The method of claim 1, wherein the genotyping of the SNPs in step a), step c) or both steps is performed after generation of allele specific products, said generation of allele specific products being done by primer extension, oligonucleotide ligation, or a cleavase reaction.
 3. The method of claim 1 or 2, wherein the genotyping for multiple SNPs in step a), step c) or both steps is performed in one reaction in a multiplexed procedure.
 4. The method of any of claims 1 to 3, wherein the allele specific PCR reaction in step b) is achieved by choosing one primer that matches one allele of a heterozygous SNP.
 5. The method of claim 4, wherein said primer is chosen as to specifically hybridize with the heterozygous SNP located at the most 5′ or the most 3′ location of all tested SNPs.
 6. The method of any of claims 1 to 5, wherein at least one primer used for the allele specific PCR is fully complementary to the sequence of one allele.
 7. The method of any of claims 1 to 6, wherein the 3′ end base of the allele specific primer specifically matches one allele of the heterozygous SNP.
 8. The method of claim 6 or 7, in which said allele specific primer has 10 to 25 bases that are complementary to the sequence of said one allele of the genomic DNA.
 9. The method of any of claims 4, 5, 7 and 8, wherein said allele specific primer has a 5′ tail that is rich in G and C.
 10. The method of any of claims 4, 5 and 7 to 9, wherein said allele specific primer has one mismatch in the complementary sequence more than 3 bases away from the 3′ end.
 11. The method of any of claims 1 to 10, wherein matrix-assisted laser desorption/ionization time-of-flight mass spectrometry is used for either or both of steps a) and c) as defined in claim 1.
 12. The method of any of claims 1 to 11, wherein the primers used for generation of the products detected in the genotyping in steps a) and/or c) are chimeric in nature.
 13. The method of claim 12, wherein the GOOD assay is applied for either or both of the genotyping steps.
 14. The method of any of claims 1 to 13, wherein electrospray ionization mass spectrometry is used for either or both of steps a) and c) of claim 1.
 15. Kit for the implementation of haplotyping by the method according to any of claims 1 to 14, comprising primers for PCR that generate allele specific products.